Abstract



Recent developments in cognitive psychology suggest models for 
knowledge and learning that often fall outside the realm of standard test 
theory. This paper concerns probability-based inference in terms of such 
models. An approach utilizing Bayesian inference networks is outlined. 
Basic ideas of structure and computation in inference networks are 
discussed, and illustrated with an example from the domain of mixed-number
subtraction.

Introduction



The psychological paradigm emerging from cognitive psychology suggests new 
models for students' capabilities, a potentially powerful framework to plan instruction,
evaluate progress, and provide feedback to students and teachers (Snow & Lohmann,
1989). As in traditional test theory, however, we face problems of inference: Just what
kinds of things are to be said about students, by themselves or others? What evidence is
needed to support such statements? How much faith can we place in the evidence, and in
the statements? How do we sort out elements of evidence that are overlapping, redundant,
or contradictory? When do we need to ask different questions or pose additional situations 
to distinguish among competing explanations of what we see? 

This paper discusses a probabilistic framework for addressing questions like these. 
The essential idea is to define a space of "student models" — simplified characterizations of 
students' knowledge, skill, and/or strategies, indexed by variables that signify their key 
aspects. From theory and data, one posits probabilities for the ways that students with 
different configurations in this space will solve problems, answer questions, and so on. 
This done, the machinery of probability theory allows one to reason from observations of a 
student's actions to likely values of parameters in a student model.

Recent developments in statistical theory make it possible to carry out such 
inference in large and complex systems of variables. The program of research introduced 
here is beginning to explore the potential of this approach in educational assessment and 
cognitive diagnosis. By working out the details of specific illustrative examples, we are 
learning about the kinds of domains and student models that are practical to address, and 
starting to tackle an agenda of practical engineering challenges. We begin with an overview 
of inference networks, walking through a simple numerical example from medical
diagnosis. An example from mixed-number subtraction illustrates the features of the
approach as applied to cognitive assessment. 

Probability-Based Inference 

Inference is reasoning from what we know and what we observe to explanations, 
conclusions, or predictions. We are always reasoning in the presence of uncertainty. The
information we work with is typically incomplete, inconclusive, and amenable to more than one
explanation. We attempt to establish the weight and coverage of evidence in what we
observe. But the very first question we must address is "Evidence about what?" Schum
(1987, p. 16) points out the crucial distinction between data and evidence: "A datum
becomes evidence in some analytic problem when its relevance to one or more hypotheses
being considered is established. . . . [E]vidence is relevant on some hypothesis if it either
increases or decreases the likeliness of the hypothesis. Without hypotheses, the relevance 
of no datum could be established." In educational assessment and cognitive diagnosis, we 
construct hypotheses around notions of the nature and acquisition of knowledge and skill. 

Schum distinguishes three types of reasoning, and the distinctions among them are
central to this presentation. Deductive reasoning flows from generals to particulars, within
an established framework of relationships among variables: from causes to effects, from
diseases to symptoms, from the way a crime is committed to the evidence likely to be found
at the scene, from a student's knowledge and skills to observable behavior. That is, under
a given state of affairs, what are the likely outcomes? Inductive reasoning flows in the
opposite direction, also within an established framework of relationships: from effects to
possible causes, from symptoms to diseases, from observable behavior to probable
configurations of a student's knowledge and skills. Given outcomes, what state of affairs
led to them? In abductive reasoning, reasoning proceeds from observations to new
hypotheses, new variables, or new relationships among variables. "Such a 'bottom-up'
process certainly appears similar to induction; but there is an argument that such reasoning
is, in fact, different from induction since an existing hypothesis collection is enlarged in the
process. Relevant evidentiary tests of this new hypothesis are then deductively inferred
from the new hypothesis" (Schum, 1987, p. 20).

The diagnostic approach discussed in this paper consists of a network of variables 
defining the student model space, the observable-outcome space, and the interrelationships
among them. All three types of reasoning play a role:

• Abductive reasoning guides the construction of the network, drawing upon research results and
previous practice to suggest the basic structure, which statistical analyses then refine. For
example, Piaget (e.g., 1960) searched painstakingly for commonalities in the 
development of children's proportional reasoning abilities over years of unique learning 
episodes of individual children. Siegler's (1981) characterization of children's 
understandings of balance-beam problems as a sequence of increasingly sophisticated 
strategic flowcharts captures key aspects of some of these patterns, and provides a 
basis for a student model space (Mislevy, Yamamoto, & Anacker, 1992). 

• Deductive reasoning, supplemented by parameter estimation, is used to posit 
distributions of observable variables given configurations of variables in the student 
model. In Siegler's study, this corresponds to determining how a child with a given set 
of strategies at her disposal might attack a given balance-beam problem, in terms of 
distributions of expected classes of actions. 

• Inductive reasoning, embodied in the algebra of probability theory, guides reasoning 
from observations of a given student to inferences about her knowledge and skills, in 
terms of updated beliefs about student-model variables. This corresponds to 
characterizing our beliefs about which balance-beam strategies a child possesses after
seeing her responses to a set of problems.

• Abductive reasoning, triggered by unexpected patterns in data, is called for again by the 
results of the inductive reasoning phase. Sometimes a particular child's responses will 
not be consistent with any of the student models in the simplified framework; inductive 
reasoning within this framework fails to provide a satisfactory working approximation 
of her knowledge and skill. In such cases, we need richer data to support further 
exploration, to generate new conjectures. 

A key concept in probability-based inference is conditional independence: Defined
generally, the variables in one subset may be related in a population, yet be independent
given the values of another subset of variables. In cognitive models, relationships among
observable variables are "explained" by unobservable variables that characterize aspects
of knowledge, skill, strategies, and so on. In Thompson's (1982) words, we ask "What
can this person be thinking so that his actions make sense from his perspective?" or "What
organization does the student have in mind so that his actions seem, to him, to form a
coherent pattern?" Judah Pearl argues that creating such intervening variables is not merely
a technical convenience, but a natural element in human reasoning:

". . . conditional independence is not a grace of nature for which we must
wait passively, but rather a psychological necessity which we satisfy
actively by organizing our knowledge in a specific way. An important tool
in such organization is the identification of intermediate variables that induce
conditional independence among observables; if such variables are not in
our vocabulary, we create them. In medical diagnosis, for instance, when
some symptoms directly influence one another, the medical profession
invents a name for that interaction (e.g., 'syndrome,' 'complication,'
'pathological state') and treats it as a new auxiliary variable that induces
conditional independence; dependency between any two interacting systems
is fully attributed to the dependencies of each on the auxiliary variable."

Pearl, 1988, p. 44.

Conditional independence is thus a conceptual tool to structure reasoning, helping 
to define variables, organize relationships, and guide deductive reasoning. In educational 
and psychological measurement, a heritage of statistical inference built around 
unobservable variables and induced conditional probability relationships extends back to 
Spearman's (e.g., 1907) early work with latent variables, to Wright's (1934) path analysis, 
to Lazarsfeld's (1950) latent class models. The resemblance of the inference networks 
presented below to LISREL diagrams (Joreskog & Sorbom, 1989) is no accident! Our 
work shares inferential machinery with this tradition, but extends the universe of discourse
to student models suggested by cognitive psychology. 
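
The definition of conditional independence given above can be written compactly. The notation here is an assumption made for illustration: $x_1, \ldots, x_n$ stand for observable variables and $\eta$ for the unobservable variables that "explain" their relationships. Given $\eta$ the observables factor, while their marginal joint distribution, obtained by averaging over $\eta$, generally does not:

```latex
\[
p(x_1,\ldots,x_n \mid \eta) \;=\; \prod_{i=1}^{n} p(x_i \mid \eta),
\qquad
p(x_1,\ldots,x_n) \;=\; \sum_{\eta} p(\eta) \prod_{i=1}^{n} p(x_i \mid \eta)
\;\neq\; \prod_{i=1}^{n} p(x_i) \quad \text{in general.}
\]
```

For continuous $\eta$ the sum would be replaced by an integral.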

Inference Networks 

Probability-based inference in complex networks of interdependent variables is an 
active topic in statistical research, spurred by applications in such diverse areas as 
forecasting, pedigree analysis, troubleshooting, and medical diagnosis (e.g., Lauritzen & 
Spiegelhalter, 1988; Pearl, 1988). Current interest centers on obtaining the distributions of 
selected variables conditional on observed values of other variables, such as likely 
characteristics of offspring of selected animals given characteristics of their ancestors, or 
probabilities of disease states given symptoms and test results. As we shall see below, 
conditional independence relationships, as suggested by substantive theory, play a central 
role in the topology of the network of interrelationships in a system of variables. If the 
topology is favorable, such calculations can be carried out in real time in large systems by 
means of strictly local operations on small subsets of interrelated variables ("cliques") and
their intersections.

This section briefly reviews basic concepts of construction and local computation
for inference networks. Details can be found in the statistical and expert-systems literature;
Lauritzen and Spiegelhalter (1988), Pearl (1988), and Shafer and Shenoy (1988), for
example, discuss updating strategies, a kind of generalization of Bayes' theorem. Computer
programs are commercially available to carry out the number-crunching aspect. We used
Andersen, Jensen, Olesen, and Jensen's (1989) HUGIN program and Noetic Systems'
(1991) ERGO for the examples in this presentation. 

To move from a structure of interrelationships among variables to a representation 
amenable to real-time local calculation, the steps listed below are taken. The first two 
encompass defining the key variables in an application and explicating their 
interrelationships. In essence, this information is the input to programs like ERGO and
HUGIN, which then carry out Steps 3 through 7. 

Step 1. Recursive representation of the joint distribution of variables.

Step 2. Directed graph representation of (1).

Step 3. Undirected, triangulated graph.

Step 4. Determination of cliques and clique intersections.

Step 5. Join tree representation. 

Step 6. Potential tables. 

Step 7. Updating scheme. 

Although computer programs are available, it is nevertheless useful to walk
through the details of a simple example, watching what happens inside the "black box," to
develop intuition that can guide more ambitious applications. We borrow a simple example
from Andreassen, Jensen, and Olesen (n.d.). It concerns two possible diseases a particular
patient may have, flu and throat infection (FLU and THRINF), and two possible
symptoms, fever and sore throat (FEV and SORETHR). The diseases are modeled as
independent, and the symptoms as conditionally independent given disease states. These
relationships are depicted in Figure 1, which will be discussed in greater detail below. All
four variables can take values of "yes" and "no." We assume that exactly one value
characterizes each variable for a patient, although we may not know these values with
certainty. We employ probabilities to express our states of belief. We note in passing that
it would be possible to work with the full joint distribution of the four variables in this
example directly, using the textbook form of Bayes' theorem to update beliefs about disease
states as symptoms become known. This approach rapidly becomes infeasible as the
number of variables in the system increases, whereas the approach described below has
been employed in networks with over 1000 variables (Andreassen, Woldbye, Falck, &
Andersen, 1987).

[[Figure 1 about here]] 

1. A recursive representation of the joint distribution of variables

A recursive representation of the joint distribution of a set of random variables
$X_1, \ldots, X_n$ takes the form

$$
\begin{aligned}
p(X_1,\ldots,X_n) &= p(X_n \mid X_{n-1},\ldots,X_1)\, p(X_{n-1} \mid X_{n-2},\ldots,X_1) \cdots p(X_2 \mid X_1)\, p(X_1) \\
&= \prod_{j=1}^{n} p(X_j \mid X_{j-1},\ldots,X_1),
\end{aligned}
\tag{1}
$$

where the term for $j=1$ is defined as simply $p(X_1)$. A recursive representation can be
written for any ordering of the variables, but one that exploits conditional independence 
relationships can prove more useful as variables drop out of the conditioning lists. This is 
where substantive theory comes into play; for example, modeling conditional probabilities
of symptoms given disease states, rather than vice versa. The following representation
exploits the independence of FLU and THRINF and the conditional independence of FEV
and SORETHR:

$$
\begin{aligned}
& P(\mathrm{FEV}, \mathrm{SORETHR}, \mathrm{FLU}, \mathrm{THRINF}) \\
&\quad = P(\mathrm{FEV} \mid \mathrm{SORETHR}, \mathrm{FLU}, \mathrm{THRINF})\, P(\mathrm{SORETHR} \mid \mathrm{FLU}, \mathrm{THRINF})\, P(\mathrm{FLU} \mid \mathrm{THRINF})\, P(\mathrm{THRINF}) \\
&\quad = P(\mathrm{FEV} \mid \mathrm{FLU}, \mathrm{THRINF})\, P(\mathrm{SORETHR} \mid \mathrm{FLU}, \mathrm{THRINF})\, P(\mathrm{FLU})\, P(\mathrm{THRINF}).
\end{aligned}
\tag{2}
$$

Equation 2, like Figure 1, indicates the qualitative dependence structure of the 
relationships among the variables without specifying quantitative values. Constructing the 
full joint distribution from the recursive representation requires the specification of
a conditional probability matrix for each variable. For each combination of values of a
variable's parents, this matrix gives the conditional probabilities of each of its potential
values. Associated with variables having no parents, such as FLU and THRINF, is a
vector of base rates or prior probabilities. We shall assign to both FLU and THRINF prior
probabilities of .11 for "yes" and .89 for "no." This might correspond to base rates in a
reference population to which our patient belongs. Conditional probabilities of FEV and of
SORETHR given all combinations of FLU and THRINF appear in Table 1. In practice,
such probabilities would be determined by disease theory, physiological principles, and
past experience. The tabled values indicate that:

• Throat infection usually causes a sore throat whether or not flu is also present (.91
and .90 respectively); flu alone occasionally leads to a sore throat (.05), but the
chance of a sore throat without either flu or throat infection is only .01.

• Having both flu and throat infection leads almost certainly to fever (.99); either
disease by itself leads to fever with probability .90; and the probability of fever
when neither disease is present is only .01.

[[Table 1 about here]] 
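
As a minimal sketch of how Equation 2 and the stated probabilities determine the full joint distribution, the following Python builds all 16 configurations by brute force and then applies the textbook form of Bayes' theorem. The dictionary layout and function names are assumptions made for illustration; only the numerical values come from the text.

```python
from itertools import product

# Prior probabilities (base rates) stated in the text.
P_FLU = {"yes": 0.11, "no": 0.89}
P_THRINF = {"yes": 0.11, "no": 0.89}

# P(symptom = "yes" | FLU, THRINF), keyed by (flu, thrinf), as described for Table 1.
P_FEV_YES = {("yes", "yes"): 0.99, ("yes", "no"): 0.90,
             ("no", "yes"): 0.90, ("no", "no"): 0.01}
P_SORETHR_YES = {("yes", "yes"): 0.91, ("yes", "no"): 0.05,
                 ("no", "yes"): 0.90, ("no", "no"): 0.01}


def cond(table_yes, value, flu, thrinf):
    """Return P(symptom = value | flu, thrinf) from a table of 'yes' probabilities."""
    p_yes = table_yes[(flu, thrinf)]
    return p_yes if value == "yes" else 1.0 - p_yes


def joint(fev, sorethr, flu, thrinf):
    """Equation 2: P(FEV|FLU,THRINF) P(SORETHR|FLU,THRINF) P(FLU) P(THRINF)."""
    return (cond(P_FEV_YES, fev, flu, thrinf)
            * cond(P_SORETHR_YES, sorethr, flu, thrinf)
            * P_FLU[flu] * P_THRINF[thrinf])


states = ("yes", "no")
# Keys are (FEV, SORETHR, FLU, THRINF) configurations.
joint_table = {cfg: joint(*cfg) for cfg in product(states, repeat=4)}
total = sum(joint_table.values())

# Textbook Bayes' theorem on the full joint: P(FLU = yes | FEV = yes).
p_fev_yes = sum(p for (fev, _, _, _), p in joint_table.items() if fev == "yes")
p_flu_and_fev = sum(p for (fev, _, flu, _), p in joint_table.items()
                    if fev == "yes" and flu == "yes")
p_flu_given_fev = p_flu_and_fev / p_fev_yes

print(len(joint_table), round(total, 6), round(p_flu_given_fev, 2))
```

The table has $2^4 = 16$ entries here; with $n$ binary variables it would have $2^n$, which is why this direct approach does not scale.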

The updating schemes discussed below assume these conditional probabilities are 
known with certainty. In practice, of course, they are usually not. Current research in the 
field includes characterizing the impact of this source of uncertainty, sequentially improving
estimates as additional data are obtained, and incorporating this uncertainty formally by
augmenting the network with variables that parameterize the extent of knowledge about 
conditional probabilities (Spiegelhalter, 1989). 

2. A directed graph representation of the joint distribution of variables

Corresponding to the algebraic representation of Equation 1 is a graphical
representation, a directed acyclic graph (DAG). The graph inherits its "directedness" and
"acyclic" properties from the recursive expression of the distribution in Equation 1.
Direction comes from which variables are written as conditional on others in the
representation, and the recursive expression prohibits "cycles" such as "A depends on B, B
depends on C, and C depends on A." Figure 1, corresponding to Equation 2, is the DAG
for our example. Each variable is a node in the graph; directed arrows run from "parents"
to "children," indicating conditional dependence relationships among the variables.
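
One way to make the DAG concrete is to store each node's parents and check that an ordering with parents before children exists. The sketch below is illustrative only: the `PARENTS` dictionary is read off Equation 2, and `topological_order` is a hypothetical helper, not part of HUGIN or ERGO.

```python
# Parents of each node, read off Equation 2 (the structure of Figure 1).
PARENTS = {
    "FLU": [],
    "THRINF": [],
    "FEV": ["FLU", "THRINF"],
    "SORETHR": ["FLU", "THRINF"],
}


def topological_order(parents):
    """Return an ordering with parents before children, or raise if a cycle exists."""
    remaining = {node: set(ps) for node, ps in parents.items()}
    order = []
    while remaining:
        # Nodes whose parents have all been placed already.
        ready = sorted(n for n, ps in remaining.items() if not ps)
        if not ready:
            raise ValueError("graph contains a directed cycle")
        for node in ready:
            order.append(node)
            del remaining[node]
        for ps in remaining.values():
            ps.difference_update(ready)
    return order


order = topological_order(PARENTS)
print(order)
```

A structure such as "A depends on B, B depends on C, and C depends on A" has no node without unplaced parents, so the function raises `ValueError` for it.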

A DAG depicts the qualitative structure of associations among variables in the 
domain. Theory about the domain is the starting point, but a real application requires 
model-fitting, model evaluation, and model refinement. While many standard techniques 
from statistical theory are useful in this endeavor, certain complications arise. In large
networks, for example, many cases will be incomplete; there is no practical need to obtain
the results of additional detailed diagnostic tests for diseases that have already been ruled
out. And while global tests of model fit are useful in comparing alternative models, more
focused tests checking local features of models and verifying predictions one case at a time
are more useful for model refinement. While the updating schemes discussed below take
the DAG structure as given, we must keep in mind that the success of an application
ultimately depends on the care and thought that go into developing that structure.

3. An undirected, triangulated graph 

Starting with the DAG, one drops the directions of the associations and adds edges
as necessary to meet two requirements. First, the parents of a given child must be
connected. Second, the graph must be triangulated; that is, any path of connections from
a variable back to itself (a loop) consisting of four or more variables must have a chord, or
"short cut." Triangulation is necessary for expressing probability relationships in a way
that lends itself to coherent propagation of information. Kim and Pearl's (1983) initial
work with individual variables showed how to carry out coherent local updating in singly
connected networks of variables, that is, networks of variable associations with no loops at all.
Most networks are not singly connected, however. Even our simple example has loops;
for example, one can start a path at FEVER, follow a connection to FLU, then to
SORTHR, then to THRINF, and finally return to FEVER.

The more recent updating schemes discussed here generalize Kim and Pearl's ideas 
by arranging variables into subsets called cliques, in a way such that the cliques form a
singly-connected graph. Generalizations of Kim and Pearl's approach can then be applied 
at the level of cliques. Triangulating the original graph of variables guarantees that a 
singly-connected clique representation can be constructed (Jensen, 1988). A triangulation 
scheme is not necessarily unique, and various algorithms have been developed to construct 
triangulated graphs that support efficient calculation (e.g., Tarjan & Yannakakis, 1984).
Figure 2 is the undirected, triangulated graph for our example. 

[[Figure 2 about here]] 
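
The two requirements of this step can be sketched in a few lines. The code below connects the parents of each child and drops directions, then tests triangulation by repeatedly deleting a vertex whose neighbours are all linked to one another, a standard chordality test used here in place of whatever algorithm HUGIN or ERGO actually applies. Function names are hypothetical.

```python
from itertools import combinations

# DAG of the example, using the node names of this section.
PARENTS = {
    "FLU": [],
    "THRINF": [],
    "FEVER": ["FLU", "THRINF"],
    "SORTHR": ["FLU", "THRINF"],
}


def moralize(parents):
    """Drop edge directions and connect the parents of each child."""
    edges = set()
    for child, ps in parents.items():
        for p in ps:
            edges.add(frozenset((p, child)))
        for a, b in combinations(ps, 2):
            edges.add(frozenset((a, b)))
    return edges


def is_triangulated(nodes, edges):
    """Chordality test: repeatedly delete a vertex whose neighbours are all linked."""
    nodes = set(nodes)
    adj = {n: {m for m in nodes if frozenset((n, m)) in edges} for n in nodes}
    while nodes:
        simplicial = next(
            (n for n in sorted(nodes)
             if all(frozenset((a, b)) in edges
                    for a, b in combinations(sorted(adj[n]), 2))),
            None)
        if simplicial is None:
            return False  # every remaining vertex lies on a chordless loop
        nodes.remove(simplicial)
        for m in adj.pop(simplicial):
            adj[m].discard(simplicial)
    return True


moral_edges = moralize(PARENTS)
print(len(moral_edges), is_triangulated(PARENTS, moral_edges))
```

In this example, connecting the parents FLU and THRINF supplies the chord for the loop FEVER, FLU, SORTHR, THRINF, so no further edges are needed.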

4. Determination of cliques and clique intersections 

From the triangulated graph, one determines cliques, subsets of variables that are 
all linked pairwise to one another. Cliques overlap, with sets of overlapping variables
called clique intersections. Cliques and clique intersections constitute the structure for local
updating. Figure 3 shows the two cliques in our example, {FEVER, FLU, THRINF} and 
{FLU, THRINF, SORTHR}. The clique intersection is {FLU, THRINF}. 

[[Figure 3 about here]] 
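
For a graph this small, the cliques can be found by exhaustive search. The sketch below enumerates every subset of nodes, keeps those that are linked pairwise, and discards any that sit inside a larger one. This brute-force approach is an illustration only and is exponential in the number of variables; the edge list is the triangulated graph described for Figure 2.

```python
from itertools import combinations

# Undirected, triangulated graph of the example.
EDGES = {frozenset(e) for e in [
    ("FEVER", "FLU"), ("FEVER", "THRINF"),
    ("SORTHR", "FLU"), ("SORTHR", "THRINF"),
    ("FLU", "THRINF"),
]}
NODES = sorted(set().union(*EDGES))


def is_complete(subset):
    """True if every pair of variables in the subset is linked."""
    return all(frozenset(pair) in EDGES for pair in combinations(subset, 2))


def maximal_cliques(nodes):
    """Brute-force search: complete subsets not contained in a larger complete subset."""
    complete = [set(s)
                for r in range(1, len(nodes) + 1)
                for s in combinations(nodes, r)
                if is_complete(s)]
    return [c for c in complete if not any(c < other for other in complete)]


cliques = maximal_cliques(NODES)
intersection = cliques[0] & cliques[1]
print([sorted(c) for c in cliques], sorted(intersection))
```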

Just as there can be multiple ways to produce a triangulated graph from a given 
DAG, there can be multiple ways to define cliques from a triangulated graph. Algorithms 
for determining a clique structure that supports efficient calculation are also a focus of 
research. The amount of computation grows roughly geometrically with clique size, as 
measured by the number of possible configurations of all values of all variables in a clique. 
A clique representation with many small cliques is therefore preferred to a representation 
with a few larger cliques. Strategies for increased efficiency at this stage include redefining 
variables, adding variables to break loops, and dropping associations when the 
consequences are benign. 

5. Join tree representation

A join-tree representation depicts the singly-connected structure of cliques and 
clique intersections. This is the structure through which local updating flows. A join tree 
exhibits the running intersection property: If a variable appears in two cliques, it appears in
all cliques and clique intersections in the single path connecting them. Figure 4 gives the
join-tree for our example.

[[Figure 4 about here]] 
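
The running intersection property can be checked mechanically: for each variable, the cliques that contain it must form a connected piece of the tree. The following is a sketch under assumed data structures (a dictionary of named cliques and a list of tree edges); the clique labels `C1`, `C2` and the three-clique counterexample are hypothetical.

```python
def has_running_intersection(cliques, tree_edges):
    """For every variable, the cliques containing it must form a connected subtree."""
    variables = set().union(*cliques.values())
    for var in variables:
        holders = {name for name, members in cliques.items() if var in members}
        # Flood-fill among the holders, moving only along join-tree edges.
        start = next(iter(holders))
        seen, frontier = {start}, [start]
        while frontier:
            current = frontier.pop()
            for a, b in tree_edges:
                if current in (a, b):
                    nxt = b if a == current else a
                    if nxt in holders and nxt not in seen:
                        seen.add(nxt)
                        frontier.append(nxt)
        if seen != holders:
            return False
    return True


# Join tree of the example: two cliques joined through {FLU, THRINF}.
example = {"C1": {"FEVER", "FLU", "THRINF"}, "C2": {"FLU", "THRINF", "SORTHR"}}
ok = has_running_intersection(example, [("C1", "C2")])

# Hypothetical counterexample: X is in A and C but not in B, which lies between them.
bad = {"A": {"X", "Y"}, "B": {"Y", "Z"}, "C": {"X", "Z"}}
not_ok = has_running_intersection(bad, [("A", "B"), ("B", "C")])
print(ok, not_ok)
```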

6. Potential tables 

Local calculation is carried out with tables that convey the joint distributions of 
variables within cliques, or potential tables. Similar tables for clique intersections are used 
to pass updating information from one clique to another. The potential tables in Table 2 
indicate the initial status of the network for our example; that is, before specific knowledge 
of a particular individual's symptoms or disease states becomes known. For example, the
potential table for Clique 1 is calculated using the prior probabilities of .11 for both flu and
throat infection, the assumption that they are independent, and the conditional probabilities
of fever for each flu/throat-infection combination.

[[Table 2 about here]] 

The initial probability for fever can be obtained by marginalizing the potential table
for Clique 1 with respect to flu and throat infection. This amounts to summing down the
"FEVER: yes" column, yielding a value of .20. Similarly, the initial probability for sore
throat is obtained by summing down the "SORTHR: yes" column in the potential table for
Clique 2, yielding .11.
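
The initial potential tables and their margins can be reproduced from the probabilities given earlier. This is a minimal sketch: the table layout (keys of the form `(symptom, FLU, THRINF)`) and the helper `potential` are assumptions for illustration, while the numbers are those stated in the text.

```python
from itertools import product

STATES = ("yes", "no")
P_FLU = {"yes": 0.11, "no": 0.89}
P_THRINF = {"yes": 0.11, "no": 0.89}
# P(symptom = "yes" | FLU, THRINF), keyed by (flu, thrinf).
P_FEVER_YES = {("yes", "yes"): 0.99, ("yes", "no"): 0.90,
               ("no", "yes"): 0.90, ("no", "no"): 0.01}
P_SORTHR_YES = {("yes", "yes"): 0.91, ("yes", "no"): 0.05,
                ("no", "yes"): 0.90, ("no", "no"): 0.01}


def potential(symptom_yes):
    """Initial clique table: P(symptom | FLU, THRINF) P(FLU) P(THRINF)."""
    table = {}
    for flu, thrinf in product(STATES, repeat=2):
        prior = P_FLU[flu] * P_THRINF[thrinf]
        p_yes = symptom_yes[(flu, thrinf)]
        table[("yes", flu, thrinf)] = p_yes * prior
        table[("no", flu, thrinf)] = (1.0 - p_yes) * prior
    return table


clique1 = potential(P_FEVER_YES)    # {FEVER, FLU, THRINF}
clique2 = potential(P_SORTHR_YES)   # {FLU, THRINF, SORTHR}

# Marginalize: sum the "yes" entries over all flu/throat-infection combinations.
p_fever = sum(p for (s, _, _), p in clique1.items() if s == "yes")
p_sorthr = sum(p for (s, _, _), p in clique2.items() if s == "yes")
print(round(p_fever, 2), round(p_sorthr, 2))
```

The two margins come to about .196 and .112, which round to the .20 and .11 reported above.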

7. Local updating

Absorbing new evidence about a single variable is effected by re-adjusting the
appropriate margin in a potential table that contains that variable, then propagating the
resulting change in that clique to other cliques via the clique intersections. This process
continues outward from the clique where the process began, until all cliques have been
updated. The single-connectedness and running intersection properties of the join tree
assure that coherent probabilities result. 

Suppose that we learn the patient in our example does have a fever. How does this 
change our beliefs about the other variables? The calculations are summarized in Table 3. 

[[Table 3 about here]] 

• The process begins with the potential table for Clique 1. In the initial condition, we
had a joint probability distribution for the variables in this clique, say, $f_0(\mathrm{FEVER},
\mathrm{FLU}, \mathrm{THRINF})$. We now know with certainty that FEVER=yes, so the column
for FEVER=no is zeroed out. Denote the updated potential table $f_1(\mathrm{FEVER}, \mathrm{FLU},
\mathrm{THRINF})$. One could re-normalize the entries in the FEVER=yes column at this
point, but only the proportionality information needs to be sent on for updating.

• The clique intersection table is updated to reflect the new proportional relationships
among the probabilities for FLU and THRINF, or $f_1(\mathrm{FLU}, \mathrm{THRINF})$.
Normalizing them to sum to one would give probabilities, $P_1(\mathrm{FLU}, \mathrm{THRINF})$,
which marginalize to .51 for FLU=yes and for THRINF=yes.

• The potential table for Clique 2 is updated by first dividing all entries in a row by 
the value for that row in the original clique intersection table, then multiplying them 
by the corresponding entries in the new one obtained in the previous step. The 
resulting entries are proportional to the new posterior probabilities for the variables 
in Clique 2. We now examine the rationale for this step in terms of probabilities 
(but recall that it suffices within the black box to simply pass the correct information 
about proportionalities along the join tree).

The initial joint probability distribution for Clique 2, $P_0(\mathrm{FLU},\mathrm{THRINF},\mathrm{SORTHR})$,
implied beliefs about flu and throat infection that were consistent with those in the
initial status of Clique 1. But incoming information about fever modified belief
about flu and throat infection, to $P_1(\mathrm{FLU},\mathrm{THRINF})$. We want to revise the
information in the potential table for Clique 2 so that it is (1) consistent with the
new beliefs about flu and throat infection, but (2) unchanged in terms of the
relationship of sore throat conditional on flu and throat infection. This is
accomplished as shown below, justifying the divide-by-old-and-multiply-by-new
algorithm:

$$
\begin{aligned}
P_1(\mathrm{FLU},\mathrm{THRINF},\mathrm{SORTHR})
&= P(\mathrm{SORTHR} \mid \mathrm{FLU},\mathrm{THRINF})\, P_1(\mathrm{FLU},\mathrm{THRINF}) \\
&= \frac{P(\mathrm{SORTHR} \mid \mathrm{FLU},\mathrm{THRINF})\, P_0(\mathrm{FLU},\mathrm{THRINF})}{P_0(\mathrm{FLU},\mathrm{THRINF})}\, P_1(\mathrm{FLU},\mathrm{THRINF}) \\
&= \frac{P_0(\mathrm{FLU},\mathrm{THRINF},\mathrm{SORTHR})}{P_0(\mathrm{FLU},\mathrm{THRINF})}\, P_1(\mathrm{FLU},\mathrm{THRINF}).
\end{aligned}
$$



• The entries in the Clique 2 potential table can be re-normed to sum to one, as shown 
in the final panel in Table 3, to facilitate the calculation of individual combinations 
of values or of margins. For example, the revised probability for sore throat is .48. 
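
The updating steps just described can be traced numerically. The sketch below follows them in order: zero out FEVER=no in Clique 1, form the new clique intersection table, divide the Clique 2 entries by the old intersection values and multiply by the new ones, then re-normalize. The table layout and variable names are assumptions for illustration, not HUGIN's or ERGO's internal representation.

```python
from itertools import product

STATES = ("yes", "no")
PAIRS = list(product(STATES, repeat=2))          # (flu, thrinf) combinations
P_FLU = {"yes": 0.11, "no": 0.89}
P_THRINF = {"yes": 0.11, "no": 0.89}
P_FEVER_YES = {("yes", "yes"): 0.99, ("yes", "no"): 0.90,
               ("no", "yes"): 0.90, ("no", "no"): 0.01}
P_SORTHR_YES = {("yes", "yes"): 0.91, ("yes", "no"): 0.05,
                ("no", "yes"): 0.90, ("no", "no"): 0.01}


def potential(symptom_yes):
    """Initial clique table over (symptom, FLU, THRINF)."""
    table = {}
    for flu, thrinf in PAIRS:
        prior = P_FLU[flu] * P_THRINF[thrinf]
        table[("yes", flu, thrinf)] = symptom_yes[(flu, thrinf)] * prior
        table[("no", flu, thrinf)] = (1.0 - symptom_yes[(flu, thrinf)]) * prior
    return table


clique1 = potential(P_FEVER_YES)
clique2 = potential(P_SORTHR_YES)
sep_old = {(f, t): P_FLU[f] * P_THRINF[t] for f, t in PAIRS}

# Absorb FEVER = yes: zero the FEVER = no entries of Clique 1 (no re-normalizing yet).
clique1_new = {key: (p if key[0] == "yes" else 0.0) for key, p in clique1.items()}

# New clique intersection table: sum FEVER out of the updated Clique 1 table.
sep_new = {(f, t): sum(clique1_new[(s, f, t)] for s in STATES) for f, t in PAIRS}

# Divide by the old intersection entries and multiply by the new ones.
clique2_new = {(s, f, t): p / sep_old[(f, t)] * sep_new[(f, t)]
               for (s, f, t), p in clique2.items()}

# Re-normalize Clique 2 so its entries are posterior probabilities.
total = sum(clique2_new.values())
posterior = {key: p / total for key, p in clique2_new.items()}

p_flu = sum(p for (_, f, _), p in posterior.items() if f == "yes")
p_thrinf = sum(p for (_, _, t), p in posterior.items() if t == "yes")
p_sorthr = sum(p for (s, _, _), p in posterior.items() if s == "yes")
print(round(p_flu, 2), round(p_thrinf, 2), round(p_sorthr, 2))
```

The resulting margins agree with the values quoted in the text: about .51 for each disease and .48 for sore throat.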

Application to Cognitive Diagnosis 

The approach we are exploring begins in a specific application by defining a 
universe of student models. This "supermodel" is indexed by parameters that signify
distinctions between states of understanding. Symbolically, we shall refer to the (typically
vector-valued) parameter of the student model as $\eta$. A particular set of values of $\eta$
specifies a particular student model, or one particular state among the universe of possible
states the supermodel can accommodate. These parameters can be qualitative or
quantitative, and qualitative parameters can be unordered, partially ordered, or completely
ordered. A supermodel can contain any mixture of these types. Their nature is derived
from the structure and the psychology of the learning area, with the goal of being able to
express essential distinctions among states of knowledge and skill.

Any application faces a modeling problem, a task construction problem, and an 
inference problem. 

The modeling problem is delineating the states or levels of understanding in a 
learning domain. In meaningful applications this might address several distinct strands of 
learning, as understanding develops in a number of key concepts, and it might address the 
connectivity among those concepts. This substep defines the structure of $p(x \mid \eta)$, where $x$
represents observations. An interesting special case occurs when the universe of student
models can be expressed as performance models (Clancey, 1986). A performance model
consists of a knowledge base and manipulation rules that can be run on problems in a
domain of interest. A particular model can contain both knowledge and production rules
that are incorrect or incomplete; the solutions it produces will be correct or incorrect in
identifiable ways. Here the parameter $\eta$ specifies features of performance models, such as
the set of production rules that characterizes a student's state of competence. 

Obviously any model will be a gross simplification of the reality of cognition. A 
first consideration in what to include in the supermodel is the substance and the psychology 
of the domain: Just what are the key concepts? What are important ways of understanding
and misunderstanding them? What are typical paths to competence? A second
consideration is the so-called grain-size problem, or the level of detail at which student
models should differ. A major factor in answering this question is the decision-making
framework under which the modeling will take place. As Greeno (1976) points out, "It
may not be critical to distinguish between models differing in processing details if the
details lack important implications for quality of student performance in instructional 
situations, or the ability of students to progress to further stages of knowledge and 
understanding." 

An analog for the student model space is Smith & Wesson's "Identikit," which 
helps police construct likenesses of suspects. Faces differ in infinitely many ways, and 
skilled police artists can sketch infinitely many drawings to match witnesses' recollections 
(which is not to say that police artists' drawings duplicate suspects' faces perfectly;
uncertainty enters in the link through the witness). Departments that can't support an artist
use an Identikit, a collection of various face shapes, noses, ears, hair styles, and so on, that
can be combined to approximate witnesses' recollections from a large, but finite, range of
possibilities. The payoff lies not in how accurately the Identikit composite depicts the
suspect, but in whether it aids the search enough to justify its use.

Research relevant to constructing student models has been carried out in a wide 
variety of fields, including cognitive psychology, the psychology of mathematics learning 
and science learning, and artificial intelligence (AI) work on student modeling. Cognitive
scientists have suggested general structures such as "frames" or "schemas" that can serve
as a basis for modeling understanding (e.g., Minsky, 1975; Rumelhart, 1980), and have
begun to devise tasks that probe their features (e.g., Marshall, 1989, 1993). Researchers
interested in the psychology of learning in subject areas such as proportional reasoning
have focused on identifying key concepts, studying how they are typically acquired (e.g., 
in mechanics, Clement, 1982; in ratio and proportional reasoning, Karplus, Pulos, & 
Stage, 1983), and constructing observational settings that allow one to infer students' 
understanding (e.g., van den Heuvel, 1990; McDermott, 1984). Our approach can succeed 
only by building upon foundations of such research.

The task construction problem is devising situations for which students who differ
in the parameter space are likely to behave in observably different ways. The conditional
probabilities of behavior of different types given the unobservable state of the student are
the values of $p(x \mid \eta)$, which may in turn be modeled in terms of another set of parameters,
say $\beta$, that have to be estimated. The $p(x \mid \eta)$ values provide the basis for inferring back
about the student state. An element in $x$ could contain a right or wrong answer to a
multiple-choice test item; it could instead be the problem-solving approach regardless of
whether the answer is right or wrong, the quickness of responding, a characteristic of a
think-aloud protocol, or an expert's evaluation of a particular aspect of the performance. 
The effectiveness of a task is reflected in differences in conditional probabilities associated 
with different parameter configurations, so a task may be very useful in distinguishing 
among some aspects of student models but useless for distinguishing among others 
(Marshall, 1989). 

The inference problem is reasoning from observations to student models. This is 
where the inference network and local computation come into play. The model-building 
and item construction steps define the relevant variables (the student-model variables $\eta$ and 
the observable variables $x$) and provide conditional probabilities. Let $p(\eta)$ represent 
expectations about $\eta$ in a population of interest, possibly non-informative, possibly based 
on expert opinion or previous analyses. Together, $p(\eta)$ and $p(x \mid \eta)$ imply our initial 
expectations for what we might observe from a student. Once we make actual 
observations, we can revise our probabilities through the network to draw inferences about 
$\eta$ given $x$, via $p(\eta \mid x) \propto p(x \mid \eta)\, p(\eta)$. Thus $p(\eta \mid x)$ characterizes belief about a particular 
student's model after having observed a sample of the student's behavior. 
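
The updating rule just stated can be sketched for a student model with a small number of discrete states. The following is a minimal illustration, not the computation carried out by the inference network software: the two states, the prior, and the response probabilities are hypothetical, and responses are assumed to be conditionally independent given the student state.

```python
def posterior(prior, likelihood, observed):
    """Return p(eta | x) over discrete student states.

    prior:      dict mapping each state eta to p(eta)
    likelihood: function (observed, eta) -> p(x | eta)
    """
    unnormalized = {eta: prior[eta] * likelihood(observed, eta) for eta in prior}
    total = sum(unnormalized.values())
    return {eta: weight / total for eta, weight in unnormalized.items()}


# Hypothetical two-state student model (values are illustrative only).
prior = {"has_skill": 0.6, "lacks_skill": 0.4}
p_correct = {"has_skill": 0.9, "lacks_skill": 0.2}


def likelihood(responses, eta):
    # Assumes responses are conditionally independent given eta.
    p = 1.0
    for r in responses:
        p *= p_correct[eta] if r == 1 else 1.0 - p_correct[eta]
    return p


# Two correct answers and one incorrect answer.
post = posterior(prior, likelihood, [1, 1, 0])
```

Enumerating every state in this way is feasible only for very small models; the local computation methods discussed earlier exist to avoid it in larger networks.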

Example: Mixed-Number Subtraction 

This example illustrates a model that is aimed at the level of short-term instructional 
guidance. The form of the evidence being collected is traditional (right or wrong 
responses to open-ended mixed-number subtraction problems), but inferences are carried 
out in a student model motivated by cognitive analyses of the domain. It concerns which of 
two strategies students apply to the problems, and whether they are able to carry out 
procedures required singly or in combination in problems. Although a much finer grain 
size can be entertained for models of these types of skills (e.g., VanLehn's 1990 analysis 
of whole number subtraction), this example incorporates the fact that whether an item is 
easy or hard for a given student depends in part on the strategy she employs. Rather than 
being discarded as noise, as it would be under standard test theory, this interaction is 
exploited by the analytic model as a source of evidence about a student's strategy usage. 

The data and the cognitive analysis upon which the student model is grounded are 
due to Kikumi Tatsuoka (1987, 1990). The middle-school students she studied 
characteristically solve mixed number subtraction problems using one of two strategies: 

Method A: Convert all whole and mixed numbers to improper fractions, subtract, then 
reduce if necessary. 

Method B: Separate mixed numbers into whole number and fractional parts, subtract as 
two subproblems, borrowing one from the minuend's whole number if 
necessary, then reduce if necessary. 

We analyzed 530 students' responses to 15 items. Table 4 shows how we 
characterized each item in terms of which of seven subprocedures would be required if it 
were solved with Method A and which would be required if it were solved with Method B. 
The student model comprises a variable for which strategy a student uses and variables for which 
of the subprocedures he is able to apply. The structure connecting the unobservable 
parameters of the student model and the observable responses is that ideally, a student 
using Method X (A or B, as appropriate to that student) would correctly answer items that 
under that strategy require only subprocedures the student has at his disposal (see 
Falmagne, 1989; Tatsuoka, 1990; and Haertel & Wiley, 1993, on models of this type). 
However, sometimes students miss items even under these conditions (false negatives), 
and sometimes they correctly answer items when they don't possess the subprocedures by 
other, possibly incorrect, means (false positives). The connection between observations 
and student model variables is thus probabilistic rather than deterministic. 
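
The link between the ideal response pattern and the probabilistic one can be sketched as follows. This is an illustration under stated assumptions rather than the model fitted in this paper: the item names, their skill requirements, and the false negative and false positive rates are all hypothetical.

```python
def ideal_response(required, mastered):
    """Return 1 if every subprocedure the item requires is mastered, else 0."""
    return int(set(required) <= set(mastered))


def p_correct(required, mastered, false_neg=0.1, false_pos=0.2):
    """Probability of a correct answer once slips and lucky guesses are allowed.

    false_neg: chance of missing an item despite having the required skills.
    false_pos: chance of answering correctly without the required skills.
    Both rates are hypothetical defaults.
    """
    if ideal_response(required, mastered):
        return 1.0 - false_neg
    return false_pos


# Hypothetical skill requirements under a single strategy.
items = {"item_a": {1}, "item_b": {1, 2}, "item_c": {1, 3, 4}}
mastered = {1, 3, 4}  # a student who lacks subprocedure 2

probs = {name: p_correct(req, mastered) for name, req in items.items()}
```

Setting both error rates to zero recovers the deterministic pattern; nonzero rates give the probabilistic connection described above.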

[[Table 4 about here]] 

A network for Method B 

Figure 5 is a graphic depiction of the structural relationships in an inference 
network for Method B only. Nodes represent variables, and arrows represent dependence 
relationships. The joint probability distribution of all variables can be represented as the 
product of conditional probabilities, with each variable expressed in terms of conditional 
probabilities given its "parents." Five nodes, "Skill 1" through "Skill5," represent basic 
subprocedures that a student who uses Method B might need to use to solve items. 
Additional nodes, such as "Skills1&2," are conjunctions, representing, for example, either 
having or not having both Skill 1 and Skill 2. The node MN stands for "mixed number 
skills." It subsumes both Skill3, separating whole numbers from fractions, and Skill4, 
borrowing a unit from a whole number; the MN node contains the logical relationship that 
Skill3 is a prerequisite for Skill4. All of these skill variables and their combinations are 
represented in Figure 5 by rectangles. They are the elements of the student model, or $\eta$. 
The relationships among the skill nodes are either empirical (probabilities of having, say, 
Skill 2 given that one does or does not have Skill 1) or logical (one has "Skills1&2" only if 
one has both Skill 1 and Skill 2). 
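
The product form of the joint distribution can be written out explicitly. The notation here is ours, introduced only for illustration: $v_1, \ldots, v_n$ are the nodes, $\mathrm{pa}(v_k)$ is the set of parents of $v_k$, and $S_1$, $S_2$, $S_{12}$, and $X_{12}$ stand for Skill 1, Skill 2, their conjunction, and the response to Item 12. The second line is a sketch for that fragment alone, assuming Skill 1 has no parents and Skill 2's only parent is Skill 1.

```latex
p(v_1, \ldots, v_n) = \prod_{k=1}^{n} p\bigl(v_k \mid \mathrm{pa}(v_k)\bigr)

p(S_1, S_2, S_{12}, X_{12}) = p(S_1)\, p(S_2 \mid S_1)\, p(S_{12} \mid S_1, S_2)\, p(X_{12} \mid S_{12})
```

In the fragment, $p(S_2 \mid S_1)$ is an empirical relationship and $p(S_{12} \mid S_1, S_2)$ is a logical one, taking only the values 0 and 1.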

[[Figure 5 about here]] 

The observables, $x$, are the actual test items. The oval nodes representing items 
are children of nodes that represent the minimal conjunction of skills necessary to 
solve that item if one uses Method B. The relationship between such a node and an item is 
probabilistic, indicating false positive and false negative probabilities. 

Cognitive theory inspired the structure of this network. Initial estimates of the 
numerical values of conditional probability relationships were approximated using results 
from Tatsuoka's (1983) "rule space" analysis of the data, with only students she classified as 
Method B users. That is, Dr. Tatsuoka's estimates of whether a student did or did not 
possess Skill 1 and Skill2 were taken as truth, and our probabilities of students having 
Skill 1, of having Skill2 given that they did or didn't have Skill 1, and so on, are empirical 
proportions from this data set. (Duanli Yan and I are exploring the estimation of these 
conditional probabilities using the EM algorithm of Dempster, Laird, & Rubin, 1977.) 
Table 5 gives three examples of the conditional probability matrices we used as input to 
HUGIN and ERGO: 

• Skill2 given Skill 1. These are the conditional probabilities of having or not having 
Skill2, given that a student does or does not have Skill 1. These were approximated 
from the results of Dr. Tatsuoka's analysis, as described above. 

• Skills1&2 given Skill 1 and Skill2. This is a logical relationship, indicating that a 
student has the conjunction of Skills 1 and 2 if and only if she has both Skill 1 and 
Skill2. 

• Item12 given Skills1&2. This matrix gives the probabilities of correctly answering 
Item 12, given that a student does or does not have the requisite set of skills under 
Method B. For the row in which Skills1&2 is true, we have the true positive and 
false negative rates, .895 and .105 respectively. For the row in which 
Skills1&2 is false, we have the false positive and true negative rates, .452 and 
.548. (A relatively high false positive rate such as this often occurs when an item on 
a test has appeared as a textbook example or homework exercise.) 
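
To show how these three matrices combine, the sketch below updates belief about Skill2 after a single response to Item 12 by brute-force enumeration. It is an illustration only, not the HUGIN or ERGO computation. The conditional probabilities are the values quoted above from Table 5; the base rate for Skill 1 (`P_SKILL1`) is an assumed value chosen for the example.

```python
from itertools import product

P_SKILL1 = 0.88  # assumed base rate, for illustration only
P_SKILL2_GIVEN_SKILL1 = {True: 0.662, False: 0.289}  # Table 5
P_ITEM12_CORRECT = {True: 0.895, False: 0.452}       # Table 5, keyed by Skills1&2


def joint(skill1, skill2, correct):
    """p(Skill1, Skill2, Item12 response); Skills1&2 is the logical AND."""
    p1 = P_SKILL1 if skill1 else 1.0 - P_SKILL1
    p2_yes = P_SKILL2_GIVEN_SKILL1[skill1]
    p2 = p2_yes if skill2 else 1.0 - p2_yes
    has_both = skill1 and skill2
    p_right = P_ITEM12_CORRECT[has_both]
    return p1 * p2 * (p_right if correct else 1.0 - p_right)


def posterior_skill2(correct):
    """p(Skill2 = yes | Item12 response)."""
    states = (True, False)
    numerator = sum(joint(s1, True, correct) for s1 in states)
    denominator = sum(joint(s1, s2, correct) for s1, s2 in product(states, repeat=2))
    return numerator / denominator


# Prior marginal for Skill2 implied by the assumed base rate for Skill 1.
prior_skill2 = (P_SKILL1 * P_SKILL2_GIVEN_SKILL1[True]
                + (1.0 - P_SKILL1) * P_SKILL2_GIVEN_SKILL1[False])
after_correct = posterior_skill2(True)
after_incorrect = posterior_skill2(False)
```

Because the false positive rate for Item 12 is high, a correct answer raises belief in Skill2 only moderately, while an incorrect answer lowers it more sharply.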

[[Table 5 about here]] 

Figure 6 presents the join tree for the DAG depicted in Figure 5. Figure 7 depicts 
base rate probabilities of skill possession and item percents-correct in the network with 
empirical associations, using the conditional probabilities from Tatsuoka's Rule Space 
analysis. This represents the state of knowledge one has about a student knowing that she 
uses Method B, but without having observed any item responses. Figure 8 shows how 
beliefs are changed after observing mostly correct answers to items requiring 
subprocedures other than Skill 2, but missing most of those that do require it. The base-rate 
and the updated probabilities for the five skills shown in Table 6 show substantial 
shifts toward the belief that the student commands Skills 1, 3, 4, and possibly 5, but 
almost certainly not Skill 2. 

[[Figures 6-8 about here]] 

[[Table 6 about here]] 

A simplified network for Method B 

An alternative representation exemplifies the tradeoffs one faces when building 
more complex networks, and illustrates their relationships to the network building and 
manipulation steps discussed above. A simpler network results if empirical relationships 
among skills are deleted, as shown in Figure 9. The resulting join tree is shown in Figure 
10. The advantage of this simpler network is a join tree with a smaller largest 
clique, containing 4 variables rather than 5. The largest potential table has only 16 entries, 
rather than 48. By such simplifications, larger networks of variables can be updated in the 
same amount of calculating time. The simpler network uses only direct information from 
item responses to update beliefs about skill possession; that is, belief for Skill 3 is changed 
only by responses to items that require Skill 3. The tradeoff is the forfeiture of indirect 
information. Suppose we have ascertained that students who possess Skills 1 and 2 
usually also possess Skill 3. The full network, incorporating this link, would revise our 
belief about Skill 3 in response to indirect evidence in the form of correct answers to items 
requiring Skills 1 and 2. The simplified network, omitting the link, would not revise belief 
about Skill 3 without direct evidence, that is, responses to items requiring Skill 3 itself. 
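
The loss of indirect information can be seen in a toy calculation. This sketch is not either of the paper's networks: it uses one binary node for "has Skills 1 and 2", one for Skill 3, and a single item that requires only Skills 1 and 2, and every probability in it is hypothetical.

```python
def posterior_skill3(p_a, p_s3_given_a, p_correct_given_a, correct=True):
    """p(Skill 3 | response to an item requiring only the skill set A).

    p_a:               prior probability of having skill set A
    p_s3_given_a:      dict {True/False: p(Skill 3 | A status)}
    p_correct_given_a: dict {True/False: p(correct | A status)}
    """
    numerator = 0.0
    denominator = 0.0
    for a in (True, False):
        pa = p_a if a else 1.0 - p_a
        px = p_correct_given_a[a] if correct else 1.0 - p_correct_given_a[a]
        denominator += pa * px
        numerator += pa * px * p_s3_given_a[a]
    return numerator / denominator


P_A = 0.6                                   # hypothetical
P_CORRECT = {True: 0.9, False: 0.2}         # hypothetical
FULL_LINK = {True: 0.9, False: 0.5}         # empirical link kept (hypothetical)

# Deleting the link leaves Skill 3 with the same marginal but no dependence on A.
marginal_s3 = P_A * FULL_LINK[True] + (1.0 - P_A) * FULL_LINK[False]
NO_LINK = {True: marginal_s3, False: marginal_s3}

full_network = posterior_skill3(P_A, FULL_LINK, P_CORRECT)
reduced_network = posterior_skill3(P_A, NO_LINK, P_CORRECT)
```

With the link in place, a correct answer raises belief in Skill 3; with the link deleted, belief in Skill 3 stays at its prior marginal.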

[[Figures 9 & 10 about here]] 

What kinds of inferential errors result from this simplification? Closed-form results 
with simple models indicate that ignoring positive relationships among unobservable 
variables higher in the network can lead to weaker, or more conservative, revision of belief 
about them from observations. This may be an acceptable price in some cases in return for 
being able to incorporate more variables into a network. (On the other hand, ignoring 
dependencies among observable variables can lead to overly strong updating, generally a 
more costly error.) A second rationale for omitting the empirical relationships among 
skills is that the resulting model, while conservative for a given population, may be more 
transportable to other populations, for example, students who studied fractions under a 
different curriculum. While the skill requirements of items may be fairly consistent over 
students, the relationships among skills may depend more heavily on the order and 
intensity with which they are studied. 

A simultaneous network for both methods 

We built a similar network for Method A. Figure 11 incorporates it and the Method 
B network into a single network that is appropriate when we don't know which strategy a 
student is using. Each item now has three parents: minimally sufficient sets of 
subprocedures under Method A and under Method B, and the new node "Is the student 
using Method A or Method B?" An item like 7 2/3 - 5 1/3 is hard under Method A but easy 
under Method B. An item like 2 1/3 - 1 2/3 is just the opposite. A response vector with most of 

the first type of items right and the second type wrong shifts belief toward the use of 
Method B, while the opposite pattern shifts belief toward the use of Method A. A pattern 
with mostly wrong answers gives posterior probabilities for Method A and Method B that 
are about the same as the base rate, but low probabilities for possessing any of the skills. 
We haven't learned much about which strategy such a student is using, but we do have 
evidence that he probably doesn't have subprocedure skills. Similarly, a pattern with 
mostly right answers again gives posterior probabilities for Method A and Method B that 
are about the same as the base rate, but high probabilities for possessing all of the skills. In 
any of these cases, the results could be used to guide an instructional decision. 
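
The way response patterns shift belief about strategy can be sketched with a toy model. This is not the Figure 11 network: the two skills, the two items and their requirements under each method, the independent skill prior, and the response probabilities are all hypothetical. The posterior is obtained by summing over skill profiles within each method.

```python
from itertools import combinations

SKILLS = ("s1", "s2")                 # hypothetical subprocedures
P_SKILL = 0.7                         # hypothetical, independent across skills
P_METHOD = {"A": 0.5, "B": 0.5}       # hypothetical base rates
TRUE_POS, FALSE_POS = 0.9, 0.2        # hypothetical response probabilities

# Hypothetical items: skills required under each method.
ITEMS = {
    "item_easy_under_B": {"A": {"s1", "s2"}, "B": {"s1"}},
    "item_easy_under_A": {"A": {"s1"}, "B": {"s1", "s2"}},
}


def skill_profiles():
    """Yield (mastered set, prior probability) for every skill profile."""
    for size in range(len(SKILLS) + 1):
        for subset in combinations(SKILLS, size):
            mastered = set(subset)
            p = 1.0
            for skill in SKILLS:
                p *= P_SKILL if skill in mastered else 1.0 - P_SKILL
            yield mastered, p


def method_posterior(responses):
    """p(method | responses); responses maps item name to 1 (right) or 0 (wrong)."""
    weights = {}
    for method, p_method in P_METHOD.items():
        total = 0.0
        for mastered, p_profile in skill_profiles():
            like = 1.0
            for item, correct in responses.items():
                has_skills = ITEMS[item][method] <= mastered
                p_right = TRUE_POS if has_skills else FALSE_POS
                like *= p_right if correct else 1.0 - p_right
            total += p_profile * like
        weights[method] = p_method * total
    norm = sum(weights.values())
    return {m: w / norm for m, w in weights.items()}


mixed = method_posterior({"item_easy_under_B": 1, "item_easy_under_A": 0})
all_right = method_posterior({"item_easy_under_B": 1, "item_easy_under_A": 1})
all_wrong = method_posterior({"item_easy_under_B": 0, "item_easy_under_A": 0})
```

In this symmetric toy setup, the mixed pattern shifts belief toward Method B, while the all-right and all-wrong patterns leave the method probabilities at their base rates, in line with the qualitative behavior described above.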

[[Figure 11 about here: network for both methods]] 

Extensions 

This example could be extended in many ways, both as to the nature of the 
observations and the nature of the student model. With the present student model, one 
might explore additional sources of evidence about strategy use: monitoring response 
times, tracing solution steps, or simply asking the students to describe their solutions! 
Each has tradeoffs in terms of cost and evidential value, and each could be sensible in some 
applications but not others. An important extension of the student model would be to allow 
for strategy switching (Kyllonen, Lohman, & Snow, 1984). Adults, for example, often 
decide whether to use Method A or Method B for a given item only after gauging which 
would be easier to apply. The variables in this more complex student model would express 
the tendencies of a student to employ various strategies under various conditions; students 
would then be mixtures in and of themselves, with "always use Method A" and "always 
use Method B" as extreme cases. Mixture problems are notoriously hard statistical 
problems; carrying out inference in the context of this more ambitious student model would 
certainly require the richer information mentioned above. Anne Beland and I (Beland & 
Mislevy, 1992) tackled this problem in the domain of proportional reasoning, addressing 
students' solutions to balance-beam tasks. We modeled students in terms of neo-Piagetian 
developmental stages based on the availability of certain concepts that could be fashioned 
into strategies for different kinds of tasks. The data for inferring students' stages were 
their solutions and their explanations of the strategies they employed. 

Conclusion 



Inference network models can play useful roles in educational assessment. One is 
the use mentioned in our example, namely, cognitive diagnosis for short-term instructional 
guidance as in an intelligent tutoring system (ITS). At ETS, we are currently working to 
implement probability-based inference for updating the student model in an aircraft hydraulics 
ITS (Gitomer, Steinberg, & Mislevy, in press). Another is mapping out the evidential 
structure of observations and student knowledge structures (Haertel, 1989; Haertel & 
Wiley, 1993). As both models and observational contexts become more complex, more 
careful thought is required to sort out and characterize the implications and qualities of 
assessment tasks if we are to use the information effectively. We plan to explore the kinds 
of problems in which the approach outlined above proves efficacious, and to develop 
exemplars and methodological tools for employing it. 

Notes 

1 This terminology is from the use of DAGs in pedigree analysis, where nodes represent 
characteristics of animals that are in fact parents and children. 

2 Partial information, such as "based on a reading from an unreliable thermometer, I'd 
place the probability of fever at .80," would lead to proportional re-adjustment of the 
columns, maintaining the proportional relationships within columns. 
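
The proportional re-adjustment described in this note can be sketched as follows. The potential table holds the Clique 1 values from Table 2; the function name and the representation of the table as a dictionary are our own choices for illustration, and the sketch covers only the adjustment within Clique 1, not the propagation to Clique 2.

```python
# Clique 1 potentials from Table 2: (FLU, THRINF) -> (FEVER yes, FEVER no).
clique1 = {
    ("yes", "yes"): (0.012, 0.000),
    ("yes", "no"): (0.088, 0.010),
    ("no", "yes"): (0.088, 0.010),
    ("no", "no"): (0.008, 0.784),
}


def soft_update(table, p_yes):
    """Rescale each column so FEVER=yes has total probability p_yes.

    Entries within a column keep their ratios to one another.
    """
    col_yes = sum(values[0] for values in table.values())
    col_no = sum(values[1] for values in table.values())
    return {
        key: (values[0] * p_yes / col_yes, values[1] * (1.0 - p_yes) / col_no)
        for key, values in table.items()
    }


# Unreliable thermometer: probability of fever placed at .80.
updated = soft_update(clique1, 0.80)
```

Each column is multiplied by a single constant, so the relative sizes of the entries within a column are unchanged while the column totals move to .80 and .20.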

Table 1 

Conditional Probabilities of Symptoms Given Disease States 

FLU    THRINF    P(SORETHR=yes)    P(SORETHR=no)
yes    yes       .91               .09
yes    no        .05               .95
no     yes       .90               .10
no     no        .01               .99

FLU    THRINF    P(FEV=yes)    P(FEV=no)
yes    yes       .99           .01
yes    no        .90           .10
no     yes       .90           .10
no     no        .01           .99

Table 2 

Potential Tables for Initial Status of Knowledge 

Clique 1 

FLU    THRINF    FEVER: yes    FEVER: no
yes    yes       .012          .000
yes    no        .088          .010
no     yes       .088          .010
no     no        .008          .784

FLU    THRINF    Probability
yes    yes       .012
yes    no        .098
no     yes       .098
no     no        .792

Clique 2 

FLU    THRINF    SORTHR: yes    SORTHR: no
yes    yes       .011           .001
yes    no        .005           .093
no     yes       .088           .010
no     no        .008           .784

Table 3 

Potential Tables after "FEVER=yes" 

Clique 1 

FLU    THRINF    FEVER: yes    FEVER: no
yes    yes       .012          0
yes    no        .088          0
no     yes       .088          0
no     no        .008          0

FLU    THRINF    Probability
yes    yes       .012
yes    no        .088
no     yes       .088
no     no        .008

Clique 2 

FLU    THRINF    SORTHR: yes    SORTHR: no
yes    yes       .011           .001
yes    no        .004           .084
no     yes       .080           .009
no     no        .000           .008

Re-Normed Table for Clique 2 

FLU    THRINF    SORTHR: yes    SORTHR: no
yes    yes       .059           .005
yes    no        .020           .426
no     yes       .406           .046
no     no        .000           .041

Table 4 

Skill Requirements for Fractions Items 

[[Table 4 body: one row for each of Items 4, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19, and 20, giving the item text and an X under each skill the item requires. Column groups: "If Method A used" (Skills 1, 2, 5, 6, 7) and "If Method B used" (Skills 1, 2, 3, 4, 5). The individual X entries are not legible in this copy.]] 

Skills: 

1. Basic fraction subtraction 

2. Simplify/Reduce 

3. Separate whole number from fraction 

4. Borrow one from whole number to fraction 

5. Convert whole number to fraction 

6. Convert mixed number to fraction 

7. Column borrow in subtraction 



Table 5 

Examples of Conditional Probability Matrices for Method B Network 

Skill2 given Skill1 

                  Skill 2 Probabilities
Skill 1 Status    Yes     No
Yes               .662    .338
No                .289    .711

Skills1&2 given Skill1, Skill2 

                                    Skills 1&2 Probabilities
Skill 1 Status    Skill 2 Status    Yes    No
Yes               Yes               1      0
Yes               No                0      1
No                Yes               0      1
No                No                0      1

Item12 given Skills1&2 

                     Item 12 Probabilities
Skills 1&2 Status    Correct    Incorrect
Yes                  .895       .105
No                   .452       .548

Table 6 

Prior and Posterior Probabilities of Subprocedure Profile 

Skill(s)           Prior Probability    Posterior Probability
1                  .883                 .999
2                  .618                 [illegible]
3                  [illegible]          .995
4                  [illegible]          [illegible]
5                  [illegible]          .561
1 & 2              .585                 .056
1 & 3              .853                 .994
1, 3, & 4          .392                 .702
1, 2, 3, & 4       .335                 .007
1, 3, 4, & 5       .223                 .492
1, 2, 3, 4, & 5    .200                 .003

Figure 1 

Directed Acyclic Graph Representation 

Figure 2 

Undirected, Triangulated Graph Representation 

Clique 1: FEVER, FLU, THRINF     Clique 2: FLU, THRINF, SORTHR 

Figure 3 
Clique Structure 

Clique 1 (FEVER, FLU, THRINF) -- Clique intersection -- Clique 2 (FLU, THRINF, SORTHR) 

Figure 4 
Join Tree Representation 

[[Figure 5: network diagram. Skill nodes: Basic fraction subtraction (Skill 1); Simplify/reduce (Skill 2); Mixed number skills; Separate whole number from fraction (Skill 3); Borrow from whole number (Skill 4); Convert whole number to fraction (Skill 5); and conjunction nodes such as Skills 1, 3, 4, & 5. Item nodes include Items 4, 6, 7, 8, 10, 11, 15, 18, 19, and 20.]] 

Figure 5 

An Inference Network for Method B 



[[Figure 6: join tree diagram linking cliques of skill nodes and cliques that pair a skill conjunction with an item.]] 

Figure 6 

Join Tree for Network with Empirical Connections Among Skills 

[[Figure 7: bar charts of probabilities for the skill and item nodes.]] 

Note: Bars represent probabilities, summing to one for all the possible values of a variable. 

Figure 7 

Initial Probabilities for Method B Network 

[[Figure 8: bar charts of probabilities for the skill and item nodes.]] 

Note: Bars represent probabilities, summing to one for all the possible values of a variable. A 
shaded bar extending to one represents certainty, due to having observed the value of that variable. 

Figure 8 

Posterior Probabilities for Method B Following Item Responses 

[[Figure 9: network diagram. Skill nodes: Basic fraction subtraction (Skill 1); Simplify/reduce (Skill 2); Separate whole number from fraction (Skill 3); Borrow from whole number (Skill 4); Convert whole number to fraction (Skill 5). Conjunction nodes: Skills 1 & 2; Skills 1 & 3; Skills 1, 3, & 4; Skills 1, 2, 3, & 4; Skills 1, 3, 4, & 5. Item nodes: Items 4, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19, and 20.]] 

Figure 9 

A Reduced Inference Network for Method B 



[[Figure 10: join tree diagram linking cliques of skill nodes and cliques that pair a skill conjunction with an item.]] 

Figure 10 

Join Tree for Network without Empirical Connections Among Skills 